AITopics | local elasticity

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Neural Information Processing SystemsApr-25-2026, 09:39:50 GMT

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent?

artificial intelligence, machine learning, neural network, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

b6b90237b3ebd1e462a5d11dbc5c4dae-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 23:28:16 GMT

arxiv preprint arxiv, kernel, neural network, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

327af0f71f7acdfd882774225f04775f-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 03:57:11 GMT

international conference, local elasticity, neural network, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Add feedback

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Neural Information Processing SystemsDec-23-2025, 23:31:16 GMT

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do training data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent?

elastic stochastic differential equation, imitating deep learning dynamic, name change, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

Add feedback

b6b90237b3ebd1e462a5d11dbc5c4dae-Paper.pdf

Neural Information Processing SystemsAug-16-2025, 00:04:39 GMT

arxiv preprint arxiv, kernel, neural network, (9 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Neural Information Processing SystemsOct-9-2024, 23:57:31 GMT

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do training data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent? As a crucial ingredient in our modeling strategy, each SDE contains a drift term that reflects the impact of backpropagation at an input on the features of all samples. Our main finding uncovers a sharp phase transition phenomenon regarding the intra-class impact: if the SDEs are locally elastic in the sense that the impact is more significant on samples from the same class as the input, the features of training data become linearly separable---meaning vanishing training loss; otherwise, the features are not separable, no matter how long the training time is. In the presence of local elasticity, moreover, an analysis of our SDEs shows the emergence of a simple geometric structure called neural collapse of the features.

elastic stochastic differential equation, imitating deep learning dynamic, local elasticity, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Elephant Neural Networks: Born to Be a Continual Learner

Lan, Qingfeng, Mahmood, A. Rupam

arXiv.org Artificial IntelligenceOct-2-2023

Catastrophic forgetting remains a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieves excellent performance on Split MNIST dataset in just one single pass, without using replay buffer, task boundary information, or pre-training. One of the biggest challenges to achieving continual learning is the decades-old issue of catastrophic forgetting (French 1999). Catastrophic forgetting stands for the phenomenon that artificial neural networks tend to forget prior knowledge drastically when learned with stochastic gradient descent algorithms on non-independent and identically distributed (non-iid) data.

activation function, learning, neural network, (13 more...)

arXiv.org Artificial Intelligence

2310.01365

Country:

North America > Canada > Alberta (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Investigating the locality of neural network training dynamics

Dan, Soham, Gampa, Phanideep, Mukherjee, Anirbit

arXiv.org Machine LearningNov-1-2021

A fundamental quest in the theory of deep-learning is to understand the properties of the trajectories in the weight space that a learning algorithm takes. One such property that had very recently been isolated is that of "local elasticity" ($S_{\rm rel}$), which quantifies the propagation of influence of a sampled data point on the prediction at another data point. In this work, we perform a comprehensive study of local elasticity by providing new theoretical insights and more careful empirical evidence of this property in a variety of settings. Firstly, specific to the classification setting, we suggest a new definition of the original idea of $S_{\rm rel}$. Via experiments on state-of-the-art neural networks training on SVHN, CIFAR-10 and CIFAR-100 we demonstrate how our new $S_{\rm rel}$ detects the property of the weight updates preferring to make changes in predictions within the same class of the sampled data. Next, we demonstrate via examples of neural nets doing regression that the original $S_{\rm rel}$ reveals a $2-$phase behaviour: that their training proceeds via an initial elastic phase when $S_{\rm rel}$ changes rapidly and an eventual inelastic phase when $S_{\rm rel}$ remains large. Lastly, we give multiple examples of learning via gradient flows for which one can get a closed-form expression of the original $S_{\rm rel}$ function. By studying the plots of these derived formulas we given a theoretical demonstration of some of the experimentally detected properties of $S_{\rm rel}$ in the regression setting.

experiment, neural network, time evolution, (12 more...)

arXiv.org Machine Learning

2111.01166

Country:

North America > United States > Pennsylvania (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Zhang, Jiayao, Wang, Hua, Su, Weijie J.

arXiv.org Machine LearningOct-11-2021

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent? In this study, we model the evolution of features during deep learning training using a set of stochastic differential equations (SDEs) that each corresponds to a training sample. As a crucial ingredient in our modeling strategy, each SDE contains a drift term that reflects the impact of backpropagation at an input on the features of all samples. Our main finding uncovers a sharp phase transition phenomenon regarding the {intra-class impact: if the SDEs are locally elastic in the sense that the impact is more significant on samples from the same class as the input, the features of the training data become linearly separable, meaning vanishing training loss; otherwise, the features are not separable, regardless of how long the training time is. Moreover, in the presence of local elasticity, an analysis of our SDEs shows that the emergence of a simple geometric structure called the neural collapse of the features. Taken together, our results shed light on the decisive role of local elasticity in the training dynamics of neural networks. We corroborate our theoretical analysis with experiments on a synthesized dataset of geometric shapes and CIFAR-10.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2110.0596

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity

Chen, Shuxiao, He, Hangfeng, Su, Weijie J.

arXiv.org Machine LearningOct-29-2020

As a popular approach to modeling the dynamics of training overparametrized neural networks (NNs), the neural tangent kernels (NTK) are known to fall behind real-world NNs in generalization ability. This performance gap is in part due to the \textit{label agnostic} nature of the NTK, which renders the resulting kernel not as \textit{locally elastic} as NNs~\citep{he2019local}. In this paper, we introduce a novel approach from the perspective of \emph{label-awareness} to reduce this gap for the NTK. Specifically, we propose two label-aware kernels that are each a superimposition of a label-agnostic part and a hierarchy of label-aware parts with increasing complexity of label dependence, using the Hoeffding decomposition. Through both theoretical and empirical evidence, we show that the models trained with the proposed kernels better simulate NNs in terms of generalization ability and local elasticity.

artificial intelligence, kernel, machine learning, (13 more...)

arXiv.org Machine Learning

2010.11775

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Filters

Collaborating Authors

local elasticity

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

b6b90237b3ebd1e462a5d11dbc5c4dae-Paper.pdf

327af0f71f7acdfd882774225f04775f-Paper.pdf

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

b6b90237b3ebd1e462a5d11dbc5c4dae-Paper.pdf

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Elephant Neural Networks: Born to Be a Continual Learner

Investigating the locality of neural network training dynamics

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity